Efficient Migration of Very Large Distributed State for Scalable Stream Processing
Abstract
Any scalable stream data processing engine must handle the dynamic nature of data streams and react quickly to every fluctuation in the data rate. Many systems successfully address data rate spikes through resource elasticity and dynamic load balancing. The main challenge is the presence of stateful operators, because their internal, mutable state must be scaled out while ensuring fault tolerance and continuous stream processing. Rescaling, load balancing, and recovery all demand state movement among work units. How to guarantee those features in the presence of large distributed state, with minimal impact on performance, is therefore still an open issue. We propose an incremental migration mechanism for fine-grained state shards through periodic incremental checkpoints and replica groups. This enables moving large state with minimal impact on stream processing. Finally, we present a low-latency hand-over protocol that smoothly migrates tuple processing among work units.
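The idea of shipping state through periodic incremental checkpoints, so that only a small delta remains at hand-over time, can be illustrated with a minimal sketch. This is not the authors' protocol: the `Shard`, `Replica`, and `hand_over` names, the per-key counter state, and the dirty-key tracking are all assumptions made for illustration, and the sketch ignores deletions, concurrency, and networking.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A fine-grained piece of keyed operator state (hypothetical structure)."""
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # keys changed since the last checkpoint

    def update(self, key, delta):
        # Process one tuple: fold it into the per-key state and mark the key dirty.
        self.state[key] = self.state.get(key, 0) + delta
        self.dirty.add(key)

    def incremental_checkpoint(self):
        # Return only the entries changed since the previous checkpoint.
        delta = {k: self.state[k] for k in self.dirty}
        self.dirty.clear()
        return delta


@dataclass
class Replica:
    """A member of the shard's replica group (assumed to live on another work unit)."""
    state: dict = field(default_factory=dict)

    def apply(self, delta):
        self.state.update(delta)


def hand_over(primary, replica):
    # Final step of a migration: ship the last (small) delta to the replica,
    # which can then take over tuple processing. Returns the delta size.
    delta = primary.incremental_checkpoint()
    replica.apply(delta)
    return len(delta)


primary, replica = Shard(), Replica()
for i in range(1000):
    primary.update(f"k{i}", 1)
replica.apply(primary.incremental_checkpoint())  # periodic checkpoint: 1000 entries

primary.update("k1", 5)
primary.update("k2", 7)
moved = hand_over(primary, replica)  # only the 2 keys touched since the checkpoint
print(moved, replica.state == primary.state)
```

In this sketch the bulk of the state travels in the background checkpoint, so the hand-over itself moves only the two entries modified afterwards.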
Related resources
Alleviating Hot-Spots in Peer-to-Peer Stream Processing Environments
Many emerging distributed applications require the processing of massive amounts of data in real time. As a result, distributed stream processing systems have been introduced, offering a scalable and efficient means of in-network processing. However, managing the load among the nodes of such a large-scale, dynamic system in real time is challenging. The peer-to-peer paradigm can help address the...
Distributed data stream processing and edge computing: A survey on resource elasticity and future directions
Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. More recently, a...
Leveraging Distributed Publish/Subscribe Systems for Scalable Stream Query Processing
Existing distributed publish/subscribe systems (DPSS) offer loosely coupled and easy to deploy content-based stream delivery services to a large number of users. However, the lack of query expressiveness limits their application scope. On the other hand, distributed stream processing engines (DSPE) provide efficient processing services for complex stream queries. Nevertheless, these systems are...
Scalable Planning for Distributed Stream Processing Systems
Recently, the problem of automatic composition of workflows has been receiving increasing interest. Initial investigation has shown that designing a practical and scalable composition algorithm for this problem is hard. A very general computational model of a workflow (e.g., BPEL) can be Turing-complete, which precludes fully automatic analysis of compositions. However, in many applications, work...
High-performance GRID Database Manager for Scientific Data
The GRID initiative provides an infrastructure for distributed computations among widely distributed high-performance computers. This will allow for exchanging and processing very large amounts of data. The LOFAR project (www.nfra.nl/lofar) is an international initiative to build a versatile, geographically distributed, multi-point radio facility for astrophysics, space physics, atmospheric phy...